|
 |
 |
To access the contents, click the chapter and section titles.
Oracle Performance Tuning and Optimization
(Publisher: Macmillan Computer Publishing)
Author(s): Edward Whalen
ISBN: 067230886x
Publication Date: 04/01/96
Chapter 19 Data Warehousing System
The concept of the data warehouse has been emerging over the last few years, but with the new, faster computers and the reduction in the cost of disk storage, the data warehouse is finally becoming a reality. This chapter first looks at what a data warehouse is, then at the characteristics of the data warehouse, and then at the data access patterns seen in the data warehouse.
As its name implies, the data warehouse is a storage depot for corporate data. Enormous amounts of data are concentrated in the data warehouse; sources for the data include the following:
- Customer information databases
- Accounts receivable
- General ledger
- Inventory databases
- Customer credit databases
- Other sources
These sources combine to provide a wealth of data about customers and their buying habits as well as information about the general state of your business.
A data warehousing system is similar to a DSS system in some of its functionality, but the scale and focus are different. A typical DSS system focuses on one type of business function; by using its various data-input sources, the data warehousing system may perform much broader business queries.
Data warehousing systems can easily achieve sizes in the hundreds of gigabytes; some systems even break the terabyte barrier. These systems are made possible by the continuing trend of computer hardware to increase in speed while decreasing in price. In the near future, it may not be uncommon to see tables of a terabyte in size.
Although the cost of data warehousing hardware is decreasing, it is still out of reach of the mainstreamfor now. As we go further into the information age and the value of information is better understood, I believe data warehousing will become more mainstream.
Characteristics of a Data Warehouse
Here are some of the characteristics of the data warehousing system:
- Queries against large volumes of data. The data warehousing system consists of much more data than the typical OLTP or DSS system.
- Queries exhibit a variety of access patterns. Queries may be simple or quite complex, with complicated joins and aggregations on large amounts of data.
- Highly complex queries. Queries on a data warehouse are typically much more complex than those used in OLTP and DSS systems.
- Data access stresses the system to its limitations. The processes stretch the system in terms of both performance and capacity.
- Intense load activity. As data from various sources is entered into the data warehousing system, the load increases dramatically.
- Is a conglomeration. A data warehouse is a compilation of various input sources, usually tied into the corporate OLTP databases and other sources.
The load on the data warehousing system is typically very high. As with the DSS system, because users do not typically use a data warehousing system for online processing, it is reasonable to push the system to its limits.
It is not uncommon for the decision support queries run against the data warehousing system to take hours or even days to complete. The queries are complex and the amount of data being queried is enormous. These systems are optimized for throughput rather than for response times. By maximizing throughput, some jobs may suffer in terms of response time.
If you think that the data warehousing system is just a glorified DSS system, you are partially correct. The data warehouse may just be the next step in the evolution of the DSS system. There are many similarities, but there are many differences as well.
Data Access Patterns
The data access patterns seen in a data warehouse are fairly similar to those seen in a DSS system. Based on the types of transactions you generate, you should be able to fairly accurately determine these patterns. Although each system has its own specific data access patterns, the data warehousing system has the following, general characteristics:
- Redo log activity is moderate to high. Unlike the DSS system (where the redo log activity is very low), the redo log activity for a data warehouse may be moderate or even high. This is caused, not by the activity of the business transactions, but by the procedures necessary to prepare and load the data. The metadata may be constantly put together from many external sources.
- Archiving activity is moderate to high. As with the redo logs, there may be significant activity due to the conglomeration of data stored in the data warehouse.
- Data access for each query is mostly sequential. Because the queries usually extract large amounts of data from the tables, full-table scans are not uncommon.
- Data reads can (and frequently do) take advantage of multiblock reads. You can expect that many of the disk accesses are the size of multiblock reads.
- Access to the data files is somewhat random. As with the DSS system, this random access is caused by contention with other transactions and join and indexing operations that result in fairly random access across the data volumes. However, these random reads from the disk drives access the data with a much larger data request.
- Data access may be sparse. Because of the massive amounts of data being stored, there may be pockets of data infrequently accessed and other pockets that see much more frequent access.
- Heavy access to the temporary tables. Because of the typical size of many of the join and sort operations, the temporary tables are hit hard. Remember that only the sorts that use less memory than SORT_AREA_SIZE are in memory.
- Data access may not be distributed evenly. Because of the massive amount of data stored in the warehouse, it is not uncommon for queries to use only isolated pieces of data.
Although these patterns vary depending on how your system operates, the general principles are the same. The access patterns to your tables vary based on how often and how much is done to each table.
|